
ENH: maybe_convert_objects seen NaT speed-up #27300

Merged
merged 4 commits into from
Jul 9, 2019
Conversation

BeforeFlight
Contributor

@BeforeFlight BeforeFlight commented Jul 8, 2019

Contributor

@jreback jreback left a comment

can you add an asv that hits this case

@BeforeFlight
Contributor Author

BeforeFlight commented Jul 8, 2019

Will add. Should I add a 'what's new' entry as well?

Also, which file should the asv benchmark go in: algorithms.py?

@jreback jreback added the Performance (memory or execution speed performance) and Timedelta (Timedelta data type) labels Jul 8, 2019
@jreback
Contributor

jreback commented Jul 8, 2019

Will add. Should I add a 'what's new' entry as well?

yes that would be great; 0.25.0 performance section.

@BeforeFlight
Contributor Author

$ asv continuous master maybe_convert_objects_ENH -f 1.1 -b algorithms.MaybeConvertObjects
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Building 56002cdd <maybe_convert_objects_ENH> for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................................
·· Installing 56002cdd <maybe_convert_objects_ENH> into conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For pandas commit c64c9cb4 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..........................................................
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running (algorithms.MaybeConvertObjects.time_maybe_convert_objects--).
[ 25.00%] · For pandas commit 56002cdd <maybe_convert_objects_ENH> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running (algorithms.MaybeConvertObjects.time_maybe_convert_objects--).
[ 50.00%] · For pandas commit 56002cdd <maybe_convert_objects_ENH> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· algorithms.MaybeConvertObjects.time_maybe_convert_objects                                                                                                                                   17.1±0.3μs
[ 75.00%] · For pandas commit c64c9cb4 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[100.00%] ··· algorithms.MaybeConvertObjects.time_maybe_convert_objects                                                                                                                                   17.5±0.8ms
       before           after         ratio
     [c64c9cb4]       [56002cdd]
     <master>         <maybe_convert_objects_ENH>
-      17.5±0.8ms       17.1±0.3μs     0.00  algorithms.MaybeConvertObjects.time_maybe_convert_objects

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
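
The 0.00 in the ratio column is a rounding artifact of a very small ratio. As a rough check of the arithmetic, here is a minimal sketch that assumes the ratio is simply after divided by before, and that -f 1.1 treats ratios above 1.1 or below 1/1.1 as changed. The helper names are hypothetical and not part of asv, and asv may apply further statistical checks that this ignores.

```python
# Seconds per unit for the time suffixes that appear in the asv output.
UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(value, unit):
    """Convert a (value, unit) pair such as (17.5, "ms") to seconds."""
    return value * UNITS[unit]


def classify(before_s, after_s, factor=1.1):
    """Return (ratio, label) under the assumed threshold rule."""
    ratio = after_s / before_s
    if ratio > factor:
        label = "slower"
    elif ratio < 1 / factor:
        label = "faster"
    else:
        label = "unchanged"
    return ratio, label


before = to_seconds(17.5, "ms")      # master
after = to_seconds(17.1, "\u03bcs")  # maybe_convert_objects_ENH
ratio, label = classify(before, after)
print(f"ratio={ratio:.5f} ({label}), speed-up of about {1 / ratio:.0f}x")
```

Under these assumptions the ratio is a little under 0.001, which a two-decimal column can only show as 0.00.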

@@ -939,7 +939,7 @@ Performance improvements
- Improved performance by removing the need for a garbage collect when checking for ``SettingWithCopyWarning`` (:issue:`27031`)
- For :meth:`to_datetime` changed default value of cache parameter to ``True`` (:issue:`26043`)
- Improved performance of :class:`DatetimeIndex` and :class:`PeriodIndex` slicing given non-unique, monotonic data (:issue:`27136`).

- Improved performance of :meth:`pandas._libs.lib.maybe_convert_objects` for the case when input contains ``NaT``.
Member

Do you know what hits this from a user perspective? This is a private method, which we wouldn't mention in a whatsnew.

Contributor Author

What should I write instead?

Member

Depends on what would touch this from user code. I see DataFrame.from_tuples and maybe GroupBy ops with datetimelike objects in the result - do you see a difference when using either of those?

Contributor

best way is to run the entire asv suite (takes an hour or so) and see what changes

@BeforeFlight
Contributor Author

I believe imports should be in this order:

import pandas as pd
from pandas._libs import lib
from pandas.util import testing as tm

But when I run isort --recursive --check-only pandas locally it only prints Skipped 3 files, with no errors. Or did I check it wrong?

@BeforeFlight
Contributor Author

Also, maybe add isort --recursive --check-only pandas to the initial list of TODOs for PRs, along with black pandas and git diff upstream/master -u -- "*.py" | flake8 --diff?

@BeforeFlight
Contributor Author

Full asv

asv continuous -f 1.1 master maybe_convert_objects_ENH

returns:

+      14.1±0.1ms       19.8±0.4ms     1.41  strings.Cat.time_cat(0, None, '-', 0.0)
+      2.92±0.1ms       4.06±0.3ms     1.39  algorithms.Quantile.time_quantile(0.5, 'midpoint', 'uint')
+        479±10μs         665±70μs     1.39  categoricals.Constructor.time_from_codes_all_int8
+     2.78±0.08ms       3.84±0.4ms     1.38  algorithms.Quantile.time_quantile(0.5, 'midpoint', 'int')
+      14.2±0.4ms       19.6±0.9ms     1.38  strings.Cat.time_cat(0, None, None, 0.0)
+      14.2±0.2ms       19.5±0.5ms     1.38  strings.Cat.time_cat(0, ',', None, 0.0)
+     4.62±0.07ms       6.30±0.4ms     1.37  algorithms.Quantile.time_quantile(0.5, 'midpoint', 'float')
+      14.3±0.3ms       19.4±0.7ms     1.35  strings.Cat.time_cat(0, ',', '-', 0.0)
+     2.47±0.08ms       3.31±0.3ms     1.34  algorithms.Quantile.time_quantile(1, 'nearest', 'float')
+      66.1±0.5ms        83.5±10ms     1.26  strings.Methods.time_rfind
+        729±60μs        913±200μs     1.25  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f6cf3692bf8>, False, 'int')
+     2.87±0.04ms       3.55±0.2ms     1.24  algorithms.Quantile.time_quantile(0.5, 'higher', 'uint')
+        894±30μs      1.10±0.02ms     1.23  indexing.NonNumericSeriesIndexing.time_getitem_pos_slice('string', 'unique_monotonic_inc')
+     9.73±0.09ms       11.7±0.9ms     1.21  series_methods.ValueCounts.time_value_counts('object')
+      7.08±0.1ms       8.46±0.2ms     1.20  timeseries.ResampleSeries.time_resample('datetime', '5min', 'mean')
+        814±80μs        964±200μs     1.18  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f6cf3692bf8>, True, 'int')
+      25.4±0.3ms         29.6±2ms     1.16  categoricals.Indexing.time_reindex
+      3.00±0.1ms       3.46±0.1ms     1.15  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+      4.01±0.2ms       4.51±0.7ms     1.12  ctors.SeriesConstructors.time_series_constructor(<function gen_of_tuples at 0x7f6cf36a9400>, False, 'int')
+         813±9μs         913±10μs     1.12  series_methods.IsInForObjects.time_isin_nans
+      2.60±0.1ms       2.92±0.2ms     1.12  ctors.SeriesConstructors.time_series_constructor(<class 'list'>, False, 'int')
+     2.66±0.07ms      2.97±0.06ms     1.12  ctors.SeriesConstructors.time_series_constructor(<class 'list'>, True, 'int')
+     4.86±0.04ms      5.42±0.09ms     1.12  timeseries.ToDatetimeISO8601.time_iso8601_format_no_sep
+         142±3μs          158±9μs     1.11  ctors.SeriesConstructors.time_series_constructor(<function no_change at 0x7f6cf3692b70>, True, 'int')
+        866±80μs         955±70μs     1.10  ctors.SeriesConstructors.time_series_constructor(<function list_of_str at 0x7f6cf3692bf8>, True, 'float')
-         262±3μs          238±9μs     0.91  groupby.GroupByMethods.time_dtype_as_group('float', 'all', 'direct')
-        575±10μs          521±6μs     0.91  groupby.GroupByMethods.time_dtype_as_group('object', 'head', 'direct')
-        91.5±5ms         82.8±1ms     0.90  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlalchemy', 'datetime')
-        641±10μs          580±3μs     0.90  groupby.GroupByMethods.time_dtype_as_group('float', 'tail', 'transformation')
-     1.68±0.04μs      1.52±0.05μs     0.90  period.PeriodProperties.time_property('min', 'hour')
-         336±5μs          303±4μs     0.90  offset.OffsetDatetimeIndexArithmetic.time_add_offset(<YearBegin: month=1>)
-         155±6ms        140±0.3ms     0.90  io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'custom')
-      50.4±0.7ms         45.3±1ms     0.90  groupby.Nth.time_frame_nth_any('object')
-        467±20μs          420±9μs     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'min', 'transformation')
-      1.65±0.1μs      1.49±0.02μs     0.90  period.PeriodProperties.time_property('min', 'minute')
-        353±30μs          317±8μs     0.90  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt32Engine'>, <class 'numpy.uint32'>), 'non_monotonic')
-         216±2μs          193±6μs     0.90  groupby.GroupByMethods.time_dtype_as_group('object', 'size', 'transformation')
-     1.60±0.01ms      1.43±0.05ms     0.89  groupby.GroupByMethods.time_dtype_as_group('int', 'value_counts', 'direct')
-         491±6ns          439±5ns     0.89  timestamp.TimestampProperties.time_days_in_month(tzutc(), 'B')
-        172±20ms          153±2ms     0.89  io.json.ToJSON.time_delta_int_tstamp('split')
-        497±10ns          442±6ns     0.89  timestamp.TimestampProperties.time_days_in_month(<UTC>, 'B')
-        83.6±7μs         74.4±2μs     0.89  inference.ToNumeric.time_from_float('ignore')
-      7.96±0.1ms       7.08±0.1ms     0.89  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'kurt')
-      6.15±0.5μs      5.46±0.06μs     0.89  io.hdf.HDFStoreDataFrame.time_store_repr
-         178±7ms          158±2ms     0.89  io.sql.SQL.time_to_sql_dataframe('sqlalchemy')
-        432±10ms          384±3ms     0.89  io.json.ReadJSONLines.time_read_json_lines_concat('int')
-     1.71±0.02μs      1.52±0.01μs     0.89  period.PeriodProperties.time_property('M', 'year')
-         494±4ns          437±6ns     0.89  timestamp.TimestampProperties.time_days_in_month(None, None)
-        477±30μs          422±3μs     0.89  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int64Engine'>, <class 'numpy.int64'>), 'non_monotonic')
-      6.67±0.2ms       5.89±0.2ms     0.88  indexing.Take.time_take('int')
-         376±4ms          332±4ms     0.88  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
-      3.45±0.1ms      3.04±0.07ms     0.88  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-      1.39±0.04s       1.23±0.01s     0.88  join_merge.MergeAsof.time_multiby('nearest')
-        821±10μs         724±20μs     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'sum', 'transformation')
-         462±4μs         407±20μs     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'last', 'transformation')
-      5.03±0.1ms      4.42±0.08ms     0.88  io.csv.ReadUint64Integers.time_read_uint64
-         280±4μs          246±5μs     0.88  groupby.GroupByMethods.time_dtype_as_group('float', 'count', 'direct')
-      3.33±0.2ms      2.92±0.08ms     0.88  io.csv.ReadCSVCachedParseDates.time_read_csv_cached(True)
-         375±9ms         328±10ms     0.88  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'direct')
-      1.69±0.1μs      1.48±0.03μs     0.88  period.PeriodProperties.time_property('M', 'dayofweek')
-         149±7μs          130±1μs     0.87  series_methods.NanOps.time_func('skew', 1000, 'int8')
-        639±30μs         558±10μs     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'nunique', 'transformation')
-      2.22±0.2ms      1.93±0.01ms     0.87  reshape.SparseIndex.time_unstack
-      2.82±0.2ms      2.46±0.03ms     0.87  reshape.SimpleReshape.time_unstack
-        937±20μs         818±10μs     0.87  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-         484±7μs          422±6μs     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'max', 'direct')
-         284±7μs          248±7μs     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'shift', 'transformation')
-      4.20±0.1ms      3.66±0.02ms     0.87  join_merge.Merge.time_merge_dataframe_integer_key(True)
-     2.52±0.08ms      2.19±0.04ms     0.87  io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', None)
-      1.70±0.1μs      1.47±0.02μs     0.87  period.PeriodProperties.time_property('min', 'month')
-         266±2μs          231±5μs     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'any', 'direct')
-         160±9ms          139±2ms     0.87  io.json.ToJSON.time_floats_with_dt_index_lines('split')
-        27.6±2ms       23.9±0.6ms     0.87  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlite', 'float')
-      12.8±0.9ms       11.1±0.2ms     0.87  reindex.LibFastZip.time_lib_fast_zip
-         194±2μs          168±4μs     0.87  indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-        283±20μs          245±5μs     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'shift', 'direct')
-     1.72±0.08μs      1.49±0.01μs     0.86  period.PeriodProperties.time_property('M', 'dayofyear')
-        971±70μs         839±10μs     0.86  indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         224±1μs          193±2μs     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'size', 'transformation')
-       654±200μs         563±20μs     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'tail', 'direct')
-        875±20μs          753±6μs     0.86  indexing.NonNumericSeriesIndexing.time_getitem_pos_slice('string', 'non_monotonic')
-     1.33±0.05ms      1.14±0.01ms     0.86  groupby.GroupByMethods.time_dtype_as_field('datetime', 'rank', 'direct')
-        47.5±6ms       40.8±0.9ms     0.86  plotting.TimeseriesPlotting.time_plot_regular_compat
-         217±5μs          186±1μs     0.86  groupby.GroupByMethods.time_dtype_as_field('datetime', 'size', 'direct')
-        32.7±1ms       28.0±0.9ms     0.86  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlite', 'float_with_nan')
-      4.00±0.3ms      3.42±0.03ms     0.86  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'mean')
-       1.33±0.1s       1.14±0.01s     0.86  join_merge.MergeAsof.time_multiby('forward')
-        654±40μs         558±20μs     0.85  groupby.GroupByMethods.time_dtype_as_group('float', 'quantile', 'direct')
-     1.47±0.09ms      1.25±0.06ms     0.85  groupby.GroupByMethods.time_dtype_as_group('object', 'value_counts', 'direct')
-     1.50±0.04ms      1.28±0.02ms     0.85  groupby.GroupByMethods.time_dtype_as_group('datetime', 'rank', 'transformation')
-     3.21±0.04ms      2.73±0.09ms     0.85  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-        499±20μs         424±10μs     0.85  groupby.GroupByMethods.time_dtype_as_group('float', 'min', 'transformation')
-     1.32±0.07ms      1.12±0.01ms     0.85  indexing.DataFrameNumericIndexing.time_bool_indexer
-        371±10μs          314±6μs     0.85  indexing.NumericSeriesIndexing.time_ix_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-        20.5±5μs       17.4±0.2μs     0.85  index_object.Indexing.time_slice_step('Float')
-        70.2±7μs         59.4±2μs     0.85  series_methods.NanOps.time_func('argmax', 1000, 'int32')
-        503±50μs          425±6μs     0.84  stat_ops.SeriesOps.time_op('mean', 'float', True)
-        413±20ms         348±10ms     0.84  io.stata.StataMissing.time_write_stata('tq')
-        205±10ms          173±3ms     0.84  io.json.ToJSON.time_float_int_str_lines('split')
-        27.8±1ms       23.4±0.2ms     0.84  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlite', 'string')
-        138±10ms          116±1ms     0.84  reshape.Cut.time_qcut_int(1000)
-        70.7±7μs         59.5±3μs     0.84  series_methods.NanOps.time_func('argmax', 1000, 'int8')
-        801±30μs          674±8μs     0.84  io.parsers.ConcatDateCols.time_check_concat('AAAA', 1)
-      2.37±0.2ms      2.00±0.03ms     0.84  groupby.GroupByMethods.time_dtype_as_group('int', 'pct_change', 'direct')
-        633±40μs          530±3μs     0.84  groupby.GroupByMethods.time_dtype_as_group('datetime', 'nunique', 'direct')
-     1.85±0.06μs      1.55±0.02μs     0.84  period.PeriodProperties.time_property('M', 'qyear')
-        599±10μs         501±20μs     0.84  groupby.GroupByMethods.time_dtype_as_group('object', 'tail', 'direct')
-      1.76±0.2μs      1.47±0.01μs     0.84  period.PeriodProperties.time_property('min', 'dayofyear')
-        42.1±5ms       35.1±0.9ms     0.83  io.sql.SQL.time_read_sql_query('sqlalchemy')
-      3.55±0.3ms      2.95±0.08ms     0.83  io.parsers.ConcatDateCols.time_check_concat(1234567890, 2)
-      9.72±0.9ms       8.06±0.5ms     0.83  series_methods.NanOps.time_func('std', 1000000, 'int32')
-        60.8±5ms       50.2±0.8ms     0.82  io.sql.ReadSQLTable.time_read_sql_table_all
-     1.56±0.06ms      1.28±0.02ms     0.82  groupby.GroupByMethods.time_dtype_as_group('float', 'rank', 'direct')
-        601±10μs         493±10μs     0.82  indexing.NumericSeriesIndexing.time_ix_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-     5.00±0.03ms      4.09±0.05ms     0.82  indexing.NonNumericSeriesIndexing.time_getitem_list_like('string', 'non_monotonic')
-        83.2±5ms         68.1±2ms     0.82  io.hdf.HDFStoreDataFrame.time_read_store_table_mixed
-        63.9±6ms         52.3±1ms     0.82  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlalchemy', 'string')
-         123±6ms          101±2ms     0.82  multiindex_object.Duplicated.time_duplicated
-      1.22±0.06s         992±10ms     0.81  join_merge.I8Merge.time_i8merge('left')
-        231±20μs          188±3μs     0.81  groupby.GroupByMethods.time_dtype_as_group('int', 'size', 'transformation')
-        134±10μs          109±3μs     0.81  series_methods.NanOps.time_func('argmax', 1000, 'float64')
-      8.49±0.1ms       6.88±0.2ms     0.81  io.sas.SAS.time_read_msgpack('xport')
-        289±10μs         233±10μs     0.81  groupby.GroupByMethods.time_dtype_as_group('float', 'shift', 'transformation')
-        225±20μs          181±4μs     0.81  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-     1.10±0.06ms         890±10μs     0.81  indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
-        351±20μs          281±6μs     0.80  inference.NumericInferOps.time_add(<class 'numpy.uint8'>)
-        37.2±5ms       29.8±0.5ms     0.80  io.sql.SQL.time_read_sql_query('sqlite')
-        23.4±1ms       18.7±0.8ms     0.80  multiindex_object.Integer.time_get_indexer
-        156±10ms          124±2ms     0.79  io.sas.SAS.time_read_msgpack('sas7bdat')
-      3.63±0.1ms       2.87±0.1ms     0.79  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-         250±7ms          198±2ms     0.79  inference.ToNumericDowncast.time_downcast('string-int', None)
-     1.19±0.03ms         944±20μs     0.79  indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-       92.7±20μs       72.9±0.7μs     0.79  inference.ToNumeric.time_from_float('coerce')
-        24.3±3ms       19.1±0.4ms     0.79  groupby.Nth.time_frame_nth('float32')
-      11.7±0.7ms       9.07±0.3ms     0.78  reindex.DropDuplicates.time_frame_drop_dups_int(True)
-        74.9±6ms         58.3±5ms     0.78  io.parsers.DoesStringLookLikeDatetime.time_check_datetimes('0.0')
-        354±30μs          273±4μs     0.77  join_merge.Concat.time_concat_empty_left(0)
-      2.01±0.2ms      1.54±0.03ms     0.77  reindex.LevelAlign.time_align_level
-         108±4ms         80.6±8ms     0.75  io.parsers.DoesStringLookLikeDatetime.time_check_datetimes('10000')
-        83.2±2ms         58.8±1ms     0.71  io.sql.WriteSQLDtypes.time_to_sql_dataframe_column('sqlalchemy', 'bool')
-      23.0±0.4ms       16.2±0.5ms     0.70  indexing.NumericSeriesIndexing.time_ix_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-        48.7±1ms         33.6±2ms     0.69  io.msgpack.MSGPack.time_write_msgpack
-        442±30μs          297±6μs     0.67  indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
-      1.98±0.3ms      1.32±0.02ms     0.67  io.parsers.ConcatDateCols.time_check_concat(1234567890, 1)
-     1.97±0.09ms      1.31±0.04ms     0.66  groupby.GroupByMethods.time_dtype_as_group('float', 'sem', 'direct')
-      1.11±0.3ms         703±10μs     0.63  groupby.GroupByMethods.time_dtype_as_group('int', 'prod', 'transformation')
-        16.2±1ms       10.2±0.2ms     0.63  reindex.DropDuplicates.time_frame_drop_dups_na(True)
-        36.1±3ms       22.5±0.6ms     0.62  io.msgpack.MSGPack.time_read_msgpack
-        3.25±2μs      1.62±0.01μs     0.50  period.PeriodProperties.time_property('min', 'week')
-        3.02±1μs      1.50±0.05μs     0.50  period.PeriodProperties.time_property('min', 'qyear')
-      16.2±0.3ms         19.1±1μs     0.00  algorithms.MaybeConvertObjects.time_maybe_convert_objects

So there are (somehow) regressions from it (or I am using asv wrong, or misinterpreting its results). I need to recheck it.

Btw, a full asv run takes 3-4 hours on my laptop, not counting the build. So it is not fast here.

@WillAyd
Member

WillAyd commented Jul 9, 2019

Yea I think there is some noise in there - do some of the regressions even hit this code?
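
One crude way to look for noise in a list like the one above is to check whether the before and after timings overlap once their ± spreads are taken into account. The sketch below does that for three rows copied from the full run. parse_row and overlaps are hypothetical helpers, not part of asv, and overlapping intervals are only a heuristic: a non-overlapping row can still come from machine noise rather than from this change.

```python
import re

# Seconds per unit; both the Greek mu and the micro sign are accepted.
UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}

# Matches a timing such as "729±60μs" or "1.10±0.02ms".
TIMING = re.compile(r"([\d.]+)±([\d.]+)(ns|[\u03bc\u00b5]s|ms|s)")


def parse_row(row):
    """Return (before, after, name); before/after are (lo, hi) in seconds."""
    before, after = TIMING.findall(row)[:2]
    # Columns are: sign, before, after, ratio, benchmark name.
    name = row.split(None, 4)[4]
    intervals = []
    for value, err, unit in (before, after):
        scale = UNITS[unit]
        intervals.append(((float(value) - float(err)) * scale,
                          (float(value) + float(err)) * scale))
    return intervals[0], intervals[1], name


def overlaps(a, b):
    """True when two (lo, hi) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


rows = [
    "-        3.25±2μs      1.62±0.01μs     0.50  period.PeriodProperties.time_property('min', 'week')",
    "+      14.1±0.1ms       19.8±0.4ms     1.41  strings.Cat.time_cat(0, None, '-', 0.0)",
    "-      16.2±0.3ms         19.1±1μs     0.00  algorithms.MaybeConvertObjects.time_maybe_convert_objects",
]

for row in rows:
    before, after, name = parse_row(row)
    verdict = "within noise" if overlaps(before, after) else "distinct"
    print(f"{verdict:12}  {name}")
```

The first row shows why a ratio of 0.50 on its own says little: its before timing has a spread of ±2μs on a 3.25μs measurement, so the two intervals overlap.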

@jreback
Contributor

jreback commented Jul 9, 2019

ok so likely this path is not hit in our asvs

so remove the whatsnew note and looks good

@@ -13,6 +15,19 @@
pass


class MaybeConvertObjects:
Member

Not sure we even need this benchmark since it doesn't indicate anything from end user experience but up to @jreback

Break alone in PR lgtm
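
For readers unfamiliar with the pattern, the break in question is an early exit from a scanning loop. The following is a minimal sketch in plain Python, not the Cython code in pandas._libs.lib. NAT, infer_kind and the step counter are hypothetical stand-ins, and the rule that the sentinel settles the result as "object" is assumed purely for illustration.

```python
NAT = object()  # hypothetical sentinel standing in for a missing datetime


def infer_kind(values, early_exit=True):
    """Scan values and return (kind, number of elements inspected)."""
    seen_float = seen_object = False
    steps = 0
    for val in values:
        steps += 1
        if val is NAT:
            # Assumed rule: the sentinel forces an object result, so
            # nothing later in the input can change the answer.
            seen_object = True
            if early_exit:
                break
        elif isinstance(val, int):
            pass  # ints are the default kind
        elif isinstance(val, float):
            seen_float = True
        else:
            seen_object = True
    if seen_object:
        kind = "object"
    elif seen_float:
        kind = "float"
    else:
        kind = "int"
    return kind, steps


data = [NAT] + list(range(1, 100_000))
print(infer_kind(data, early_exit=False))  # inspects every element
print(infer_kind(data, early_exit=True))   # stops at the sentinel
```

Both calls return the same kind; only the number of inspected elements differs, which is the sort of gap a benchmark with the sentinel near the start of the input would expose.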

Contributor Author

I want to add more tests here later as well - for more generalized cases.

@jreback jreback added this to the 0.25.0 milestone Jul 9, 2019
@jreback jreback merged commit 9240439 into pandas-dev:master Jul 9, 2019
@jreback
Contributor

jreback commented Jul 9, 2019

thanks @BeforeFlight followups welcome.
